
Sequence recognition for automatic vector removal with DNA Sequence Assembler

Contents


1. General info about the pGEM-T Easy vector
2. Designing recognition sequences for the pGEM-T Easy vector
3. Using recognition sequences with DNA Sequence Assembler (a tutorial showing that the recognition sequences designed in section 2 work)

 

 

1. General info about the pGEM-T Easy vector

 

The sequence of the pGEM(R)-T Easy Vector is shown below. The vector is supplied in linear form to facilitate ligation of the insert: it has been linearized with EcoRV at base 60 of this sequence (indicated by an asterisk *), and a T has been added to both 3' ends (the added T is not included in the sequence given below). The insert is therefore ligated immediately after base 60, at the position of the asterisk.

 

>   pGEM-T Easy Vector (Promega corporation)

   1  GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG
  51  GGAATTCGAT* ATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT
 101  GGGAGAGCTC CCAACGCGTT GGATGCATAG CTTGAGTATT CTATAGTGTC
 151  ACCTAAATAG CTTGGCGTAA TCATGGTCAT AGCTGTTTCC TGTGTGAAAT
 201  TGTTATCCGC TCACAATTCC ACACAACATA CGAGCCGGAA GCATAAAGTG
 251  TAAAGCCTGG GGTGCCTAAT GAGTGAGCTA ACTCACATTA ATTGCGTTGC
 301  GCTCACTGCC CGCTTTCCAG TCGGGAAACC TGTCGTGCCA GCTGCATTAA
 351  TGAATCGGCC AACGCGCGGG GAGAGGCGGT TTGCGTATTG GGCGCTCTTC
 401  CGCTTCCTCG CTCACTGACT CGCTGCGCTC GGTCGTTCGG CTGCGGCGAG
 451  CGGTATCAGC TCACTCAAAG GCGGTAATAC GGTTATCCAC AGAATCAGGG
 501  GATAACGCAG GAAAGAACAT GTGAGCAAAA GGCCAGCAAA AGGCCAGGAA
 551  CCGTAAAAAG GCCGCGTTGC TGGCGTTTTT CCATAGGCTC CGCCCCCCTG
 601  ACGAGCATCA CAAAAATCGA CGCTCAAGTC AGAGGTGGCG AAACCCGACA
 651  GGACTATAAA GATACCAGGC GTTTCCCCCT GGAAGCTCCC TCGTGCGCTC
 701  TCCTGTTCCG ACCCTGCCGC TTACCGGATA CCTGTCCGCC TTTCTCCCTT
 751  CGGGAAGCGT GGCGCTTTCT CATAGCTCAC GCTGTAGGTA TCTCAGTTCG
 801  GTGTAGGTCG TTCGCTCCAA GCTGGGCTGT GTGCACGAAC CCCCCGTTCA
 851  GCCCGACCGC TGCGCCTTAT CCGGTAACTA TCGTCTTGAG TCCAACCCGG
 901  TAAGACACGA CTTATCGCCA CTGGCAGCAG CCACTGGTAA CAGGATTAGC
 951  AGAGCGAGGT ATGTAGGCGG TGCTACAGAG TTCTTGAAGT GGTGGCCTAA
1001  CTACGGCTAC ACTAGAAGAA CAGTATTTGG TATCTGCGCT CTGCTGAAGC
1051  CAGTTACCTT CGGAAAAAGA GTTGGTAGCT CTTGATCCGG CAAACAAACC
1101  ACCGCTGGTA GCGGTGGTTT TTTTGTTTGC AAGCAGCAGA TTACGCGCAG
1151  AAAAAAAGGA TCTCAAGAAG ATCCTTTGAT CTTTTCTACG GGGTCTGACG
1201  CTCAGTGGAA CGAAAACTCA CGTTAAGGGA TTTTGGTCAT GAGATTATCA
1251  AAAAGGATCT TCACCTAGAT CCTTTTAAAT TAAAAATGAA GTTTTAAATC
1301  AATCTAAAGT ATATATGAGT AAACTTGGTC TGACAGTTAC CAATGCTTAA
1351  TCAGTGAGGC ACCTATCTCA GCGATCTGTC TATTTCGTTC ATCCATAGTT
1401  GCCTGACTCC CCGTCGTGTA GATAACTACG ATACGGGAGG GCTTACCATC
1451  TGGCCCCAGT GCTGCAATGA TACCGCGAGA CCCACGCTCA CCGGCTCCAG
1501  ATTTATCAGC AATAAACCAG CCAGCCGGAA GGGCCGAGCG CAGAAGTGGT
1551  CCTGCAACTT TATCCGCCTC CATCCAGTCT ATTAATTGTT GCCGGGAAGC
1601  TAGAGTAAGT AGTTCGCCAG TTAATAGTTT GCGCAACGTT GTTGCCATTG
1651  CTACAGGCAT CGTGGTGTCA CGCTCGTCGT TTGGTATGGC TTCATTCAGC
1701  TCCGGTTCCC AACGATCAAG GCGAGTTACA TGATCCCCCA TGTTGTGCAA
1751  AAAAGCGGTT AGCTCCTTCG GTCCTCCGAT CGTTGTCAGA AGTAAGTTGG
1801  CCGCAGTGTT ATCACTCATG GTTATGGCAG CACTGCATAA TTCTCTTACT
1851  GTCATGCCAT CCGTAAGATG CTTTTCTGTG ACTGGTGAGT ACTCAACCAA
1901  GTCATTCTGA GAATAGTGTA TGCGGCGACC GAGTTGCTCT TGCCCGGCGT
1951  CAATACGGGA TAATACCGCG CCACATAGCA GAACTTTAAA AGTGCTCATC
2001  ATTGGAAAAC GTTCTTCGGG GCGAAAACTC TCAAGGATCT TACCGCTGTT
2051  GAGATCCAGT TCGATGTAAC CCACTCGTGC ACCCAACTGA TCTTCAGCAT
2101  CTTTTACTTT CACCAGCGTT TCTGGGTGAG CAAAAACAGG AAGGCAAAAT
2151  GCCGCAAAAA AGGGAATAAG GGCGACACGG AAATGTTGAA TACTCATACT
2201  CTTCCTTTTT CAATATTATT GAAGCATTTA TCAGGGTTAT TGTCTCATGA
2251  GCGGATACAT ATTTGAATGT ATTTAGAAAA ATAAACAAAT AGGGGTTCCG
2301  CGCACATTTC CCCGAAAAGT GCCACCTGAT GCGGTGTGAA ATACCGCACA
2351  GATGCGTAAG GAGAAAATAC CGCATCAGGA AATTGTAAGC GTTAATATTT
2401  TGTTAAAATT CGCGTTAAAT TTTTGTTAAA TCAGCTCATT TTTTAACCAA
2451  TAGGCCGAAA TCGGCAAAAT CCCTTATAAA TCAAAAGAAT AGACCGAGAT
2501  AGGGTTGAGT GTTGTTCCAG TTTGGAACAA GAGTCCACTA TTAAAGAACG
2551  TGGACTCCAA CGTCAAAGGG CGAAAAACCG TCTATCAGGG CGATGGCCCA
2601  CTACGTGAAC CATCACCCTA ATCAAGTTTT TTGGGGTCGA GGTGCCGTAA
2651  AGCACTAAAT CGGAACCCTA AAGGGAGCCC CCGATTTAGA GCTTGACGGG
2701  GAAAGCCGGC GAACGTGGCG AGAAAGGAAG GGAAGAAAGC GAAAGGAGCG
2751  GGCGCTAGGG CGCTGGCAAG TGTAGCGGTC ACGCTGCGCG TAACCACCAC
2801  ACCCGCCGCG CTTAATGCGC CGCTACAGGG CGCGTCCATT CGCCATTCAG
2851  GCTGCGCAAC TGTTGGGAAG GGCGATCGGT GCGGGCCTCT TCGCTATTAC
2901  GCCAGCTGGC GAAAGGGGGA TGTGCTGCAA GGCGATTAAG TTGGGTAACG
2951  CCAGGGTTTT CCCAGTCACG ACGTTGTAAA ACGACGGCCA GTGAATTGTA
3001  ATACGACTCA CTATA

 

 

 

2. How to design recognition sequences

 

 

Step I: Identify the base positions where the insert will be ligated

 

In our example, it is position 60 (marked by an asterisk *):

 

>   pGEM-T Easy Vector (Promega corporation)

   1  GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG
  51  GGAATTCGAT* ATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT
 101  GGGAGAGCTC CCAACGCGTT GGATGCATAG CTTGAGTATT CTATAGTGTC
 151  ACCTAAATAG CTTGGCGTAA TCATGGTCAT AGCTGTTTCC TGTGTGAAAT

 

 

Step II: Add a T at the end of each flanking region (the added bases are the T just before the asterisk and the A just after it)

The T is added at the 3' end of each strand. The bases before the asterisk end at the 3' end of the given strand, so we add a T there. The bases after the asterisk start at the 5' end of the given strand, so we add an A there; this A pairs with the T added to the 3' end of the complementary strand.

 

>   pGEM-T Easy Vector (Promega corporation)

   1  GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG
  51  GGAATTCGATT* AATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT
 101  GGGAGAGCTC CCAACGCGTT GGATGCATAG CTTGAGTATT CTATAGTGTC
 151  ACCTAAATAG CTTGGCGTAA TCATGGTCAT AGCTGTTTCC TGTGTGAAAT
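
Step II can also be expressed as a short Python sketch. It assumes the first 100 bases of the vector are held in a plain string; the names `VECTOR_START`, `INSERTION_SITE`, and `add_t_overhangs` are made up for this illustration and are not part of DNA Baser.

```python
# First 100 bases of the pGEM-T Easy sequence shown above (spaces removed).
VECTOR_START = (
    "GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG "
    "GGAATTCGAT ATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT"
).replace(" ", "")

INSERTION_SITE = 60  # 1-based position of the last base before the asterisk


def add_t_overhangs(vector, site):
    """Split the vector at the insertion site and add the overhang bases."""
    left = vector[:site] + "T"   # T on the 3' end of the given strand
    right = "A" + vector[site:]  # A pairs with the T on the complementary strand
    return left, right


left_flank, right_flank = add_t_overhangs(VECTOR_START, INSERTION_SITE)
print(left_flank[-11:])  # GGAATTCGATT
print(right_flank[:11])  # AATCACTAGTG
```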

 

 

Step III:

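Identify the F recognition sequence by selecting 15-20 bases at the end of the 1st flanking region (immediately before the asterisk). In this example we select the last 18 bases, which run from the end of the first row into the second row below.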

 

>   pGEM-T Easy Vector (Promega corporation)

   1  GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG

  51  GGAATTCGATT* AATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT

 101  GGGAGAGCTC CCAACGCGTT GGATGCATAG CTTGAGTATT CTATAGTGTC

 151  ACCTAAATAG CTTGGCGTAA TCATGGTCAT AGCTGTTTCC TGTGTGAAAT

 

The F recognition sequence is: GGCCGCGGGAATTCGATT
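
Selecting the F recognition sequence amounts to taking the last bases of the first flanking region. Below is a minimal Python sketch, assuming the flank (bases 1-60 plus the added T) is held in a plain string. The names `LEFT_FLANK` and `forward_recognition` are illustrative only, and the default length of 18 is simply the length used in this example.

```python
# 1st flanking region after Step II: bases 1-60 of the vector plus the added T.
LEFT_FLANK = (
    "GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG GGAATTCGAT T"
).replace(" ", "")


def forward_recognition(left_flank, length=18):
    """Return the last `length` bases of the 1st flanking region."""
    if length > len(left_flank):
        raise ValueError("requested length exceeds the flanking region")
    return left_flank[-length:]


f_seq = forward_recognition(LEFT_FLANK)
print(f_seq)  # GGCCGCGGGAATTCGATT
```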

 

 

Step IV:

Select 15-20 bases at the start of the 2nd flanking region (immediately after the asterisk). In this example we select the first 18 bases.

 

>   pGEM-T Easy Vector (Promega corporation)

   1  GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG

  51  GGAATTCGATT* AATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT

 101  GGGAGAGCTC CCAACGCGTT GGATGCATAG CTTGAGTATT CTATAGTGTC

 151  ACCTAAATAG CTTGGCGTAA TCATGGTCAT AGCTGTTTCC TGTGTGAAAT

 

The selected bases in the 2nd flanking region are: AATCACTAGTGAATTCGC

 

 

Step V:

Visually check that the F recognition sequence is different from the selected bases in the 2nd flanking region

 

The F recognition sequence is: GGCCGCGGGAATTCGATT
The selected bases in the 2nd flanking region are: AATCACTAGTGAATTCGC

If they are different, move to Step VI. If they are identical, repeat Steps III and IV, this time selecting longer sequences.
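
The visual check can be backed up by a simple string comparison. The sketch below is only an illustration: the helper `normalize` is a made-up name, and it strips spaces and punctuation so that sequences copied from a grouped listing compare correctly.

```python
def normalize(seq):
    """Upper-case a sequence and drop spaces, dots, and other non-letters."""
    return "".join(ch for ch in seq.upper() if ch.isalpha())


f_seq = normalize("GGCCGCGGGAATTCGATT")
second_flank = normalize("AATCACTAGTG AATTCGC.")

if f_seq != second_flank:
    print("Different: continue with Step VI")
else:
    print("Identical: repeat Steps III and IV with longer sequences")
```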

 

 

Step VI:

Obtain the R recognition sequence by taking the reverse complement of the bases selected from the 2nd flanking region (Step IV)

 

The selected bases in the 2nd flanking region are: AATCACTAGTGAATTCGC
The R recognition sequence = the reverse complement of the selected bases in the 2nd flanking region = GCGAATTCACTAGTGATT
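
The reverse complement is easy to get wrong by hand, so it can help to compute it. Below is a minimal Python sketch using only the standard library; the function name `reverse_complement` is our own, and the code handles only the four unambiguous bases A, C, G, and T.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    cleaned = seq.upper().replace(" ", "")
    return cleaned.translate(COMPLEMENT)[::-1]


r_seq = reverse_complement("AATCACTAGTG AATTCGC")
print(r_seq)  # GCGAATTCACTAGTGATT
```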

 

 

Step VII:

Visually check that the F and the R recognition sequences are not identical

 

The F recognition sequence is: GGCCGCGGGAATTCGATT
The R recognition sequence is: GCGAATTCACTAGTGATT
If they are different, you are done. If they are identical, repeat Steps III to VI, this time selecting longer sequences.
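
The whole design procedure, including the "select longer sequences" fallback, can be sketched as one Python function. This is an illustration under stated assumptions, not DNA Baser's own method: the names `MCS`, `SITE`, and `design_recognition_sequences` are hypothetical, the starting length of 18 is the one used in this example, and the upper limit of 40 bases is an arbitrary safety bound.

```python
# First 100 bases of the pGEM-T Easy sequence (spaces removed).
MCS = (
    "GGGCGAATTG GGCCCGACGT CGCATGCTCC CGGCCGCCAT GGCGGCCGCG "
    "GGAATTCGAT ATCACTAGTG AATTCGCGGC CGCCTGCAGG TCGACCATAT"
).replace(" ", "")
SITE = 60  # last base before the insertion point (1-based)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def design_recognition_sequences(vector, site, length=18, max_length=40):
    """Return (F, R), lengthening the selection until the checks pass."""
    left = vector[:site] + "T"   # Step II: T on the 1st flanking region
    right = "A" + vector[site:]  # Step II: A on the 2nd flanking region
    while length <= min(max_length, len(left), len(right)):
        f_seq = left[-length:]                        # Step III
        selected = right[:length]                     # Step IV
        r_seq = selected.translate(COMPLEMENT)[::-1]  # Step VI
        # Steps V and VII: F must differ from the selection and from R.
        if f_seq != selected and f_seq != r_seq:
            return f_seq, r_seq
        length += 1  # identical: retry with longer sequences
    raise ValueError("no distinct pair of recognition sequences found")


f_seq, r_seq = design_recognition_sequences(MCS, SITE)
print(f_seq)  # GGCCGCGGGAATTCGATT
print(r_seq)  # GCGAATTCACTAGTGATT
```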

 

 

 

 

3. How to input and use the recognition sequence with DNA Baser Assembler

 

Step I: Define your vectors

 

Click the Tasks button to open the 'Tasks' panel.

In the 'Tasks' panel, choose the desired task from the 'Sequence processing' or 'Mutation detection' section.

 

[Screenshot: Task manager]

 

The PROJECT MANAGER window will open.

Click the 'Vector Removal' tab:

[Screenshot: the 'Vector Removal' tab]

 

In the 'Vector Removal' tab you can enter your vector recognition sequence(s).

In the "Add new recognition sequence" box, enter the name and the nucleotides of the F recognition sequence.

Press the ADD button to add the sequence to the 'Current recognition sequences' list:


[Screenshot: Sequence Recognition & Vector Removal tab]

In the "Vector cleaning" box, choose whether to remove or keep the recognition sequence when it is found. The vector is removed in both cases; only the recognition sequence is kept, if you choose that option.

[Screenshot: Cut or keep recognition sequence]

 

Repeat the operation for the R recognition sequence. Make sure that both recognition sequences are active (the check box in front of each one is checked):

[Screenshot: pGEM-T Easy vector in 'Current recognition sequences']

Press the APPLY button to save the settings.

 

Note: For details about vector removal and sequence recognition, see the Vector Removal page.

 

 

Step II: Sequence assembly

 

Click the PROJECT BUILDER tab, navigate to the folder containing your sequences and add them into the JOB LIST.

Now you are ready to start the sequence assembly (or mutation detection) by pressing the "Start sequence assembly" button.



 

Step III: Contig inspection

 

If the sequence assembly was successful, the 'Assembly Window' will open. Here you can see whether the recognition sequences were found and the vector was removed. The recognition sequences are marked in blue and the vector bases are struck through. The vector will be cut from your contig automatically when you save the contig to disk.


[Screenshot: Recognition sequences were found and the vector was removed]

 

In the screenshot above, the recognition sequence was not removed, as per user choice (see Step I). If you want it to be removed, you need to select the 'Cut recognition sequence' option in the 'Vector Removal' tab, BEFORE assembling the sequences.

 

In this example, both recognition sequences were found:

 

[Screenshot: Recognition sequences were found]
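
To picture what the 'cut' and 'keep' options from Step I mean for a single read, here is a toy Python sketch. It is not DNA Baser's algorithm: it assumes an exact match and a read in which the vector lies upstream of the recognition sequence, whereas real trace data contain base-calling errors. The function `trim_vector` and the read `toy_read` are invented for this illustration.

```python
def trim_vector(read, recognition_seqs, cut_recognition=True):
    """Drop everything upstream of the first recognition sequence found."""
    read = read.upper()
    for rec in recognition_seqs:
        pos = read.find(rec)
        if pos != -1:
            # Either cut the recognition sequence too, or keep it in the read.
            start = pos + len(rec) if cut_recognition else pos
            return read[start:]
    return read  # no recognition sequence found: leave the read unchanged


F_SEQ = "GGCCGCGGGAATTCGATT"
R_SEQ = "GCGAATTCACTAGTGATT"

# Made-up read: a few vector bases, the F recognition sequence, then insert.
toy_read = "CCATGGC" + F_SEQ + "ACGTACGTAC"

print(trim_vector(toy_read, [F_SEQ, R_SEQ]))                         # ACGTACGTAC
print(trim_vector(toy_read, [F_SEQ, R_SEQ], cut_recognition=False))  # F_SEQ + insert
```

In both calls the vector bases upstream of the recognition sequence are discarded; the flag only decides whether the recognition sequence itself stays in the result.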

 

 

 

 

Custom support

 

If you need custom support with tasks similar to this one (sequence assembly, vector removal, primer design, automation of sequence cleaning/processing jobs, etc.), we can provide it at an affordable price.

 

 
